# The Deaths of Ideas in Congress
This repository contains the replication materials for the article "The Deaths of Ideas in Congress" by Jeremy Gelman, forthcoming in Political Research Quarterly.

## Data
1. deathideas_data.csv #Main section-level dataset
2. deathideas_billdata.csv #Main bill-level dataset
3. member_intros_deaths.csv #Dataset used to analyze whether deaths predict introductions
4. House_assignments_103-117 #Stewart's House committee assignment data
5. Senate_assignments_103-117 #Stewart's Senate committee assignment data
6. appropriationsbills.xlsx #List of appropriations bills excluded from the analysis
7. CELHouse93to116.xlsx #House data from the Center for Effective Lawmaking
8. CELSenate93to116.xlsx #Senate data from the Center for Effective Lawmaking
9. bills102-114.csv #Congressional Bills Project data
10. institutional_variables #Divided government and gridlock interval variables
11. training_testdata.xlsx #Hand-coded training data used to train the decision tree
12. fullbill_list.csv #Full list of congressional bills to be scraped 
13. boilerplate.txt #List of section-opening words used to identify boilerplate sections
14. cbp_metadata.csv #Congressional Bills Project data used for bill introduction dates
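The description of boilerplate.txt implies a simple filter: a section counts as boilerplate when it opens with one of the listed words. The Python sketch below illustrates that idea under stated assumptions and is not the code in section_cleaning.py. It assumes boilerplate.txt holds one entry per line (an entry may be a multi-word phrase) and that matching ignores case and punctuation; the helper names `load_boilerplate_entries` and `is_boilerplate` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical sketch, not the repository's code. Assumes boilerplate.txt
# lists one section-opening word or phrase per line.


def _tokens(text: str) -> List[str]:
    """Lowercase the text and keep only word-like tokens (drops punctuation)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def load_boilerplate_entries(lines: Iterable[str]) -> List[Tuple[str, ...]]:
    """Turn raw lines into token tuples, skipping blank lines."""
    return [tuple(_tokens(line)) for line in lines if _tokens(line)]


def is_boilerplate(section_text: str, entries: List[Tuple[str, ...]]) -> bool:
    """True if the section's opening tokens equal any boilerplate entry."""
    tokens = _tokens(section_text)
    return any(tuple(tokens[: len(entry)]) == entry for entry in entries)


if __name__ == "__main__":
    # With the real file, pass open("boilerplate.txt", encoding="utf-8") instead.
    entries = load_boilerplate_entries(["short title", "table of contents", ""])
    sections = [
        "Short title. This Act may be cited as the Example Act.",
        "The Secretary shall issue rules within 90 days.",
    ]
    kept = [s for s in sections if not is_boilerplate(s, entries)]
    print(kept)  # ['The Secretary shall issue rules within 90 days.']
```

Matching whole leading tokens, rather than raw string prefixes, keeps a short entry such as "a" from flagging every section that happens to start with that letter.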

## Required Software and Packages
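Python (3.9):
- difflib
- nltk
- textdistance
- os
- pandas
- numpy
- time
- sklearn (installed as scikit-learn)
- glob
- re
- csv

difflib, os, time, glob, re, and csv ship with Python; the other packages must be installed separately.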

R (4.2.2):
- httr
- stringr
- dplyr

Stata (version 14 or higher)

## Code
Preprocessing
1. section_scraping.R #Uses fullbill_list.csv and the govinfo.gov API to scrape complete bill texts. Each bill is saved as a .txt file.
2. section_split.R #Splits bills into sections. Each section is saved as a .txt file.
3. section_cleaning.py #Removes boilerplate sections and creates a CSV file containing the raw and processed section texts.
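Splitting a bill into sections amounts to cutting its text at each section header. The sketch below is written in Python rather than R and is not section_split.R; it assumes that every section begins on its own line with a header such as "SECTION 1." or "SEC. 2." (optionally with a letter suffix, e.g. "SEC. 2A."). The pattern name `SECTION_HEADER` and the function `split_sections` are hypothetical, and real govinfo bill texts may need additional rules.

```python
import re
from typing import List

# Hypothetical sketch, not section_split.R. Assumed header format:
# "SECTION 1." or "SEC. 2." (optional capital-letter suffix) at line start.
SECTION_HEADER = re.compile(r"^[ \t]*(?:SECTION|SEC\.)[ \t]+\d+[A-Z]?\.", re.MULTILINE)


def split_sections(bill_text: str) -> List[str]:
    """Split a bill's full text at each section header."""
    starts = [m.start() for m in SECTION_HEADER.finditer(bill_text)]
    if not starts:
        # No recognizable headers: keep the whole text as a single section.
        return [bill_text.strip()] if bill_text.strip() else []
    bounds = starts + [len(bill_text)]
    # Each section runs from its header to the next header (or the end).
    # Text before the first header (title, enacting clause) is dropped.
    return [bill_text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    bill = (
        "A BILL\nTo improve bridges.\n"
        "SECTION 1. SHORT TITLE.\nThis Act may be cited as the Bridge Act.\n"
        "SEC. 2. GRANTS.\nThe Secretary shall award grants.\n"
    )
    for section in split_sections(bill):
        print(repr(section))
```

The pattern is case-sensitive and anchored to the start of a line, so in-text cross-references such as "section 2 of this Act" are not treated as new sections.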

Decision tree
1. decisiontree_functions.py #Functions that compute the similarity statistics used by the decision tree
2. decisiontree.py #Trains and tests the decision tree.
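A decision tree that classifies pairs of sections needs numeric features describing how similar the two texts are. The Python sketch below shows what such pairwise statistics can look like using only the standard library; it is not the feature set in decisiontree_functions.py, which may rely on textdistance or nltk measures instead. The function name `similarity_features` and the four feature names are hypothetical.

```python
import difflib
import re
from typing import Dict, List

# Hypothetical sketch of pairwise similarity features, not the functions in
# decisiontree_functions.py.


def _tokens(text: str) -> List[str]:
    """Lowercased word tokens with punctuation removed."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity_features(a: str, b: str) -> Dict[str, float]:
    """Compute a few similarity statistics for one pair of sections."""
    ta, tb = _tokens(a), _tokens(b)
    sa, sb = set(ta), set(tb)
    union = sa | sb
    return {
        # Shared distinct words relative to all distinct words in either text.
        "jaccard": len(sa & sb) / len(union) if union else 1.0,
        # Share of a's distinct words that also appear in b.
        "containment": len(sa & sb) / len(sa) if sa else 0.0,
        # Order-aware similarity of the two token sequences (0 to 1).
        "sequence_ratio": difflib.SequenceMatcher(None, ta, tb).ratio(),
        # Shorter length divided by longer length, counted in tokens.
        "length_ratio": (min(len(ta), len(tb)) / max(len(ta), len(tb))) if (ta or tb) else 1.0,
    }


if __name__ == "__main__":
    print(similarity_features(
        "The Secretary shall issue rules",
        "The Secretary shall issue final rules",
    ))
```

For the example pair, Jaccard similarity is 5/6, containment is 1.0, the sequence ratio is 10/11, and the length ratio is 5/6. Assuming each hand-coded pair in training_testdata.xlsx carries a match label, one row of such features per pair could serve as the `X` passed to scikit-learn's `DecisionTreeClassifier().fit(X, y)`, with the labels as `y`; the spreadsheet's actual columns are not documented here.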

Data creation
1. intro_enacted_matches.py #Matches sections introduced and enacted in the same Congress. Creates a CSV file for each Congress with all introduced sections matched to enacted sections.
2. same_congress_matches.py #Matches introduced sections with identical sections in the same Congress. Creates a CSV file for each Congress.
3. reintroduced_matches.py #Matches unenacted introduced sections with sections from the next Congress. Creates a CSV file for each Congress.
4. deathsofideas_sectionlevel_datacreation.py #Creates the section-level dataset used in the statistical analyses (deathideas_data.csv).
5. deathsofideas_billlevel_datacreation.py #Creates the bill-level dataset used in the statistical analyses (deathideas_billdata.csv).
6. deathspredict_intros.py #Creates the member-level dataset used in the final analysis section of the paper (member_intros_deaths.csv).
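The matching scripts pair each introduced section with a candidate section (enacted in the same Congress, identical within the same Congress, or introduced in the next Congress) and write one CSV file per Congress. The Python sketch below shows the general shape of that step for the introduced-to-enacted case. It is not the repository's code: a fixed similarity cutoff (`MATCH_THRESHOLD = 0.8`, an assumed value) stands in for the trained decision tree's match decision, and the function names, section IDs, and CSV columns are hypothetical.

```python
import csv
import difflib
import io
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical sketch, not the repository's matching scripts. A fixed
# similarity cutoff stands in for the trained decision tree.
MATCH_THRESHOLD = 0.8  # assumed value, chosen only for illustration


def _tokens(text: str) -> List[str]:
    """Lowercased word tokens with punctuation removed."""
    return re.findall(r"[a-z0-9']+", text.lower())


def best_match(section: str, candidates: Dict[str, str]) -> Tuple[Optional[str], float]:
    """Return (id, score) of the most similar candidate; (None, 0.0) if none overlap."""
    target = _tokens(section)
    best_id, best_score = None, 0.0
    for cand_id, cand_text in candidates.items():
        score = difflib.SequenceMatcher(None, target, _tokens(cand_text)).ratio()
        if score > best_score:
            best_id, best_score = cand_id, score
    return best_id, best_score


def match_sections(introduced: Dict[str, str], enacted: Dict[str, str],
                   threshold: float = MATCH_THRESHOLD) -> List[dict]:
    """Pair each introduced section with its best enacted match at or above threshold."""
    rows = []
    for intro_id, text in introduced.items():
        cand_id, score = best_match(text, enacted)
        rows.append({
            "intro_id": intro_id,
            "enacted_id": cand_id if score >= threshold else None,
            "score": round(score, 3),
        })
    return rows


def rows_to_csv(rows: List[dict]) -> str:
    """Serialize match rows as CSV text (None becomes an empty field)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["intro_id", "enacted_id", "score"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    introduced = {
        "hr1_s2": "The Secretary shall issue rules within 90 days.",
        "hr1_s3": "Funds are authorized for bridge repair.",
    }
    enacted = {"pl_s5": "The Secretary shall issue rules within 90 days."}
    print(rows_to_csv(match_sections(introduced, enacted)))
```

Under the same assumptions, pointing `enacted` at sections from the same Congress with a threshold of 1.0 approximates an identical-section search, and pointing it at the next Congress's sections approximates a reintroduction search. Comparing every pair is quadratic, so a full Congress would likely need some pre-filtering.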

Analysis
1. deathofideas_analysis.do #Stata .do file that runs the analyses reported in the paper.